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Abstract 

Linear inverse problems are very common in signal and image processing. Many algorithms that aim at solving such 
problems include unknown parameters that need tuning. In this work we focus on optimally selecting such parameters 
in iterative shrinkage methods for image deblurring and image zooming. Our work uses the projected Generalized 
Stein Unbiased Risk Estimator (GSURE) for determining the threshold value A and the iterations number K in these 
algorithms. The proposed parameter selection is shown to handle any degradation operator, including ill-posed and 
even rectangular ones. This is achieved by using GSURE on the projected expected error We further propose an 
efficient greedy parameter setting scheme, that tunes the parameter while iterating without impairing the resulting 
deblurring performance. Finally, we provide extensive comparisons to conventional methods for parameter selection, 
showing the superiority of the use of the projected GSURE. 

Keywords: Iterated Shrinkage, Stein Unbiased Risk Estimator, Separable Surrogate Function, Inverse problems. 



1. Introduction 

In many applications in signal and image processing there is a need for solving a linear inverse problem f?, Q]- 
Here we consider the scenario in which an original image x is degraded by a linear operator H followed by an additive 
white Gaussian noise w. Our observation can be written as 

y = Hx + w, (1) 

where the goal is to reconstruct the original image x from y. In our setting, there is no Bayesian prior on x; it is 
therefore treated as a deterministic unknown. 

There are many techniques in the literature for this task that focus on different objectives dUlSBSSHIilS 
lol, n |. Most of these methods include parameters that require tuning. Some of the algorithms are iterative, and thus 



the number of iterations needs to be set as well. Tuning of such parameters is generally not a trivial task. Very often, 
the objective in these problems is recovery of the signal with a minimal Mean-Squared Error (MSE) between the 
original image and the estimated one [12]. In this case, ideally, the parameters should be chosen such that this MSE is 
minimized. However, since the original image x is deterministic, the MSE will depend in general on the unknown x. 
Consequently, we cannot know what choice of parameters minimizes the true MSE. In practice, therefore, a common 
approach to parameter tuning is still manual, by looking at the reconstructed result and judging its quality subjectively. 

The literature offers several automatic ways for choosing the parameters of the reconstruction algorithm One 
such method is the Unbiased Predictive Risk Estimator (UPRE) 1 13], which is valid only for the case that the recon- 
struction operator is linear. Other popular tools are the generalized cross validation (GCV) technique with its many 



variants Ill4tl . and the L-Curve method Ill5ll . both available for general reconstruction algorithms. The number of 
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iterations can be chosen using the discrepancy principle 1 16], such that the norm of the error between the degraded 
image and reconstructed image is close to the noise variance. 

An alternative technique for parameter tuning is the Stein Unbiased Risk Estimator (SURE) llTl llSll . SURE 
provides an unbiased estimate of the MSE for a candidate solver, including non-linear ones. Minimizing this estimated 
MSE, leads to the automatic tuning desired. The original SURE by Stein is limited to the case of denoising of 
white additive Gaussian noise. One of the best known algorithms that uses SURE is Donoho's SureShrink denoising 
algorithm lfT9ll . There are also other contributions a long these lines, as in 21 , 221. 

Recently, a generalization of SURE (GSURE) |23] has been developed for more general models that can be de- 
scribed in the form of exponential family distribution. This provides a new opportunity for choosing the parameters 
automatically in a much more diverse set of inverse problems. In the case where H is rank-deficient, or even rectan- 
gular (with more columns than rows), the inverse problem becomes ill-posed. In such a scenario, GSURE can at best 
estimate the MSE in the range of H^. We refer hereafter to a tuning of the algorithm's parameters with this estimation 
as the projected GSURE. 



The reconstruction algorithms used in our work belong to the iterated shrinkage family of methods iTllSL 124 1251] . 
In these algorithms, the image is known to have a sparse representation over a dictionary. Reconstruction is obtained 
by an iterative process composed of applying the dictionary and the degradation operator with their conjugates, in 
addition to a point-wise shrinkage operation. In these algorithms the threshold value A in the shrinkage parameter 
needs to be tuned, along with the number of iterations K to be performed. 

The work reported in ll26ll develops a special case of GSURE for the case where the blur operator H is assumed to 
be full-rank. This work also addresses automatic tuning of the parameters A and K in an iterated shrinkage algorithm, 
used for the deblurring problem. For a pre-chosen number of iterations, the GSURE is used to select a fixed value 
of A that optimizes the overall estimated MSE. For a given A, the GSURE is again used to determine the number of 
iterations. We refer to this method of setting A and K as the global method. 

Similar to the work in |26], this paper proposes to tune A and K for iterated shrinkage algorithms, which are 
geared towards solving inverse problems in image processing. Our work extends the applicability of GSURE to ill- 
posed inverse problems, using the projected GSURE. We show that turning from the plain GSURE to its projected 
version, the performance of the parameter tuning improves substantially. 

In addition, we propose an alternative to the global method mentioned above, where the value of A is chosen by 
minimizing the estimated MSE at each iteration, thereby obtaining a different value of A per iteration. We refer to 
this technique as the greedy method. The main benefit of such a strategy is the natural way with which the number 
of iterations is set simultaneously - the iterations can be stopped when the estimated MSE improvement is under a 
certain threshold. In order to further improve the performance with regard to the overall estimated MSE in this greedy 
approach, A in each iteration can be chosen with a look-ahead on the estimated MSE in the next few iterations. 

This paper is organized as follows: In Section|2]we present the deblurring and image scaUng (zooming) problems, 
and the iterative shrinkage algorithms used for solving them. Section[3]describes several conventional techniques for 
parameter setting (GCV, L-curve, and the discrepancy method). Section |4]presents the SURE with its generalization, 
as developed in 112 311 . In Section |5] we use GSURE for the task of parameter selection in the iterative shrinkage 
algorithms using the two approaches - global and greedy. The results of these methods are presented in Section|6] We 
conclude in Section|7]with a summary of this work, and a discussion on future research directions. 



2. Iterative Shrinkage Methods 

To define a Unear inverse problems, we need to specify the model of the original image, the noise characteristics, 
and the degradation operator We consider two types of operators H. The first is a blur operator, and the second is a 
scale-down operator that defines a scale-up problem. In the blur case, H is a square matrix that blurs each pixel by 
replacing it with a weighted average of its neighbors. In the scale-down scenario, H is composed of two operators 
applied sequentially, the first blurs the image and the second down-samples it. In both settings we assume that the 
operators are known, and the additive noise w is white Gaussian with known variance en. Our goal in both settings is 
to reconstruct the original image x without assuming any Bayesian prior. 

In order to evaluate the quality of the reconstruction, we measure the MSE between the reconstructed image and 
the original one, 

MSE(x) = £[||x-x||2]. (2) 
2 



This criterion is very commonly used in image processing despite its known shortcomings 112711 . Two related measures 
that are also common are the Peak Signal to Noise Ratio (PSNR), and the Improvement in Signal to Noise Ratio 
(ISNRfl which are both inversely proportional to the MSE. Thus, all of the criteria favor low MSE. 

Since the goodness of our fit is measured by the MSE, a natural approach would be to choose an estimate x 
that directly minimizes the MSE. Unfortunately, however, the MSE depends in general on x and therefore cannot be 
minimized directly. This difficulty is explained in more detail in |,1Z1 . To avoid direct MSE minimization, in the 



context of imaging it is often assumed that x has a sparse representation a under a given dictionary D 112811 . This 
knowledge of x is then used together with a standard least-squares criterion to find an estimate x. More specifically, a 
popular approach to recover x is to first seek the value of a that minimizes an objective of the form: 

f(a)^^-\\y-m)a\\\ + Ap{a). (3) 

The first term requires that the reconstructed image, after degradation, should be close to the measured image y. The 
second term forces sparsity using the penalty function p. The minimizer of this function is the representation of the 
desired image, and its multiplication by D gives the reconstructed image. Although intuitive, this strategy does not in 
general minimize the MSE. 

The minimization of /(or) in (O is typically performed using an iterative procedure. An appealing choice is the 
iterative shrinkage methods 10. Si- These algorithms are iterative, applying the matrices H and D and their adjoints, 
and an additional element-wise thresholding operation in each iteration. Consequently, these methods are well-suited 
for high-dimensional problems with sparse unknowns. This family of techniques includes the Separable Surrogate 
functionals (SSF) method |7], the Parallel Coordinate Descent (PCD) method |29, 24], and some others. In this work 
we focus on the SSF and PCD techniques. The SSF algorithm proposes the iteration formula 

ak^i = Sp,,ic (^(HD)^(y - HDorj) + ajj , (4) 

which is proven to converge f?'] to a local minimum of (O. In the case where p is convex, this is also the global 
minimum. The term Sp^^ic is a shrinkage operator dependant on p with threshold AJc. The constant c depends on the 
combined operator HD. The resulting estimator is 

X = hA,K{y) = ^OlK, (5) 

where K is the number of SSF iterations performed. When our task is minimizing the objective in Q, K is chosen 
to be the point that the decreasing rate is under a certain threshold. When aiming at minimization of the MSE, both 
A and K need to be tuned carefully: in many cases, from a certain iteration, the objective value continues to descend, 
while the MSE value increases. 

In iillTi an alternative iterated shrinkage approach is presented, which leads to faster convergence. In this 



technique, termed PCD, the iteration formula is given by 

ak+\ ^at- fi(\k - ak) (6) 

where 

V* = Sp,vvA (W(HD)^(y - HDa,) + or,) . (7) 

The diagonal matrix W contains the norms of the columns of HD, which are computed offline. Once is computed, 
a line search is performed along the trajectory - a^, using the parameter fi. This method is also proven to converge 
ll24il . Our estimator in this case is computed by (|5]l like in the SSF. 

In order to use the SSF and the PCD methods for the inverse problems mentioned, the penalty function p and the 
dictionary D need to be chosen as well as the A and the number of iterations. Following Elad et. al. ll24ll . we use the 
function 

p,(a) = |a|-ilog|l + ^) ie(0,(x,). (8) 



'The.se are defined as PSNR = 10 log,o (255^/MS£j and ISNR = 10 logm (M5£„//M5£). where MSE^.f is a reference-MSE to compare 
against. 
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This is a smoothed version of the /'p-norm in the range I < p < 2. The parameter s defines which ^p-norm we are 
approximating in this range. For small values of s (e.g., s - 0.001) this function serves as a smoothed version of 
the {i -norm, and we use it as the penalty function throughout this work. This choice of p dictates the shape of the 
shrinkage operator S ll24ll . 

There are various possibilities for choosing D. In our simulations, and following the work reported in iol 25, sll. 
we use the undecimated Haar wavelet with 3 layers of resolution. The motivation for this choice is that when looking 
for sparse representations, redundant dictionaries are favorable. 

The parameter A which determines the weight of the penalty function also needs setting, as do the number of 
iterations. Ideally, we would like to choose the parameters that minimize the MSB between the original image x and 
its estimate. However, as we have seen already, the dependence of the MSB on x precludes its direct minimization. 
We now turn to present several methods that aim to bypass this difliculty. 



3. Conventional Parameter Tuning methods 

The main approaches the literature offers for automatic parameter setting are the GCV [ 14], the L-Curve 1 15] and 
the discrepancy principle lfl6l] . For linear deconvolution methods, the parameter selection can also be based on the 
Unbiased Predictive Risk Bstimator (UPRB) l(l3ll : however, the algorithms we treat here are nonlinear 

In this section we accompany the description of these methods with an image deblurring experiment along the 
following lines; Working on the image cajneramain, we use the blur kernel 

kerl : h(xi, X2) ^ j^-^^, jci, JC2 = -7, 7 (9) 

and an additive white Gaussian noise with variance cr^ - 2. This experiment and its chosen parameters are adopted 
from Figueiredo ef. al. 

3.1. The Generalized Cross Validation ( GCV) Method 

For a reconstruction algorithm h,\x(:) and a degraded image y, we define the residual of the algorithm to be 

rh,^(y) = H/z^,^:(y) - y. (10) 

In the case where /ii,A:(0 is linear, the GCV method selects the parameters to be those that minimize the GCV func- 
tional given by 



GCW(A,K) 



«„ Il'"''-t.if(y)|l2 



ftrace(I-H/z,,jf (■))'' 



(11) 



where I is the identity matrix, and Wy is the length of y. In high dimensions, the trace in the denominator of (fTTT i is 
costly to compute directly. Stochastic estimation of this expression can be used instead [30]. Specifically, noting that 
E |^n^/!,ijf (n)j = trace (/ii,A:(-)) where n is a random vector n ~ A^(0, 1), we can approximate the trace by one such 
instance of the random vector. More than one instance of the random variable n can be used for better estimation of 
the trace by averaging, but in this work we use just one such realization of n. Replacing the trace in ( fTTT l yields 

Ik''.i,/f(y)|l2 

GCy{A,K)^- . (12) 

1 - J-nrH/7,,^(n)' 

y 

Some extensions of the GCV to non-linear cases were treated in ll3lll32il33i [lll. but as far as we know the literature 
does not offer extensions of the GCV to non-linear deconvolution algorithms like the one we address in this work. 
We suggest using ( fT2b as the GCV objective to be minimized for the parameters selection, despite the non-linearity of 

hA,K{-)- 
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Figure[T|presents an experiment using GCV. For fixed number of iterations, K - 43, the GCV as a function of A is 
presented in this figure. The value of A is chosen to be the minimizer of the GCV function. This minimal point can be 
found by a line search, but such a procedure may get trapped in a local minimum of the GCV curve, since the obtained 
curve is not necessarily unimodal. Here and in later figures, the value chosen based on the true MSB is marked by an 
asterisk. In this case, as can be seen, the GCV performs well. 
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Figure 1 : The GCV curve as a function of A for the PCD algorithm, with kerl as blur and with noise power of cr^ 
where K^A3. 



3.2. The L-curve method 

For setting a parameter with the L-curve method, a log of the squared norm of the result of the algorithm versus 
the squared norm of its residual is plotted over a range of parameter values. The result is typically a curve with an L 
shape. The L-curve criterion is to choose the parameter value that corresponds to the "corner" of the curve - the point 
with maximum curvature |i5|. For a parameter of interest A, the curve is defined by the x-axis X{A) - log ||'";iA(y)||2 



and the y-axis Y{A) - log ||/ii(y)||2. The curvature is then given by 



K{A) 



X"{A)Y'{A)-X'{A)Y"(A) 



{X'iAf + F'(i)2) 



2^3/2 



(13) 



The value of A is selected to maximize ( fTST l. Since the L-curve can only be sampled at specific discrete points, 
the calculation of the second order derivatives is very sensitive and thus should be avoided. Following Hansen and 
O'Leary |15], we apply a local smoothing on the L-curve points and use the new smoothed points as control points 
for a cubic spline. We then calculate the derivatives of the spline at those points and get the curvature on each of them. 
The smoothing is done by fitting five points, where the smoothed point is the center of them, to a straight line in the 
least squares sense. 

As in the case of the GCV, according to our knowledge, the L-Curve has not been applied to non-linear deconvolu- 
tion methods. Limitations of this approach for linear deconvolution are presented in 1 1, 34]. In addition, the curvature 
is very sensitive and thus we can very easily be diverted from the appropriate value. In addition to this shortcoming, 
each calculation of a curvature of a point needs the values of the L-curve of its neighbors. Therefore, it is hard to tune 
the parameters in an efficient way. 

Figure |2]presents the experiment results using the L-curve method. This figure shows both the L-curve (top) and 
its curvature (bottom) for a fixed A" = 43. The location of the chosen parameter A is marked by a diamond. As can be 
seen, the A value chosen as the curvature maximizer, in this case, is very close to the ideal value. 
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(a) L-curve (b) curvature of the L-curve 



Figure 2: The L-curve results for setting A for the PCD algorithm with kerl, cr^ = 2, and fixing K = 43. The L-curve 
is on the top and its curvature is on bottom. 



3.3. The Discrepancy Principle 

A very intuitive parameter tuning idea is to use the discrepancy principle. Based on the noise properties, the 
original image approximately satisfies ||Hx - y||^ ~ «ycr^. The discrepancy principle assumes that the same should 
hold for the reconstructed image and thus selects the parameters based on the assumption that 

\\lih^K(y)-y\\l^nyCT\ (14) 

The iteration number, K, is chosen to be the first iteration in which ||H/z/i:(y) - yllj < nycr^. 

Experimenting with the discrepancy principle, Fig.[3]presents the value of |||H/zjf (y) - yllj - nyO-^\ as a function of 
K, where A = 0.065 is fixed. The discrepancy principle selects the parameter value that achieves the minimal value on 
the graphs. As can be seen, although this method is very appealing, in many instances like this one, it tends to select 
a value of K far from the optimal one. 

In this section we have described three parameter tuning methods: The GCV, the L-curve and the discrepancy 
principle. The discrepancy principle, which is commonly used as a stopping criterion, tends to fail in many cases, 
as can be seen in this section. As for the GCV and the L-curve methods, as far as we know, they were not applied 
before to non-linear deconvolution algorithms of the kind we use here. While their deployment is reasonable, their 
performance tends to be overly sensitive, possibly leading to poor results, as will be shown in the results section. 
The fact that these three methods may give good parameter tuning in part of the cases may be explained by the fact 
that these methods do not aim at minimizing the MSB but use different considerations for the tuning. This leads us 
naturally to consider ways to select the parameters directly based on an estimated value of the MSB. 



4. GSURE and Projected GSURE 

In this section we discuss an unbiased estimator for the MSB obtained by any reconstruction algorithm. This 
MSB estimate will later be used for parameter selection. The Stein unbiased risk estimator (SURB) was proposed in 
lfl7 . 18 1 as an estimate of the MSB in the Gaussian denoising problem, namely, when H = I and the noise is i.i.d. and 
Gaussian. This idea was then extended in [23] to handle more general inverse problems of the form discussed here, 
and a wider class of noise distributions. We refer to the later MSB estimate as the generalized SURB (GSURB). 
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Figure 3: The discrepancy principle for selecting the number of iterations for the PCD algorithm with kerl, cr^ 
and fixing A - 0.065. 



The essential concept behind the GSURE principle is to seek an unbiased estimate for the MSB that is only a 
function of the estimate h(y) and y (up to a constant which may depend on x but is independent of h{y)). Denote such 
an MSB estimate by ?7(x, y). Then, 

£[/7(x,y)] =£f||x-x||2] =£[||/z(y)-x||2]. (15) 

If we have access to such a function rjix, y) that estimates the MSB, while being also dependent on a set of parameters 
© (that control the reconstruction performance), then we may choose the values of so as to minimize this function, 
and in that way aim at minimizing the true MSB as a consequence. 



For the case where H has full column rank, the GSURB is given by Eldar II23I1 



jj(h(u), y) = c, + \\h(u)\\l - 2xlji(u) + 2V„ ■ h(u). (16) 
Here, u - (l/cr^)H^y is the sufficient statistic for the model ([T]l, V is the divergence operator 

T7 \ dh(ii)[i] (dh(u)\ 

and C] is a constant independent of /i(u). Given that our goal is to minimize the MSB by setting h{u), the constant c\ 
is irrelevant for the minimization process. The term Xml stands for the Maximum Likelihood (ML) estimator — the 
minimizer of 

min||y-Hz||. (18) 

z 

If H^H is invertible, then this estimate is given by 

XML = (H^H)-iH^y. (19) 

The MSB estimate presented in (fTST i is valid only if H^H is invertible. When this condition is not satisfied, H 
has a nontrivial null space. Therefore, we have no information about the components of x that lie in the null space 
AA(H). Thus we can estimate only the projected MSB on the range of (denoted by !R(H^)), which is equal to the 
orthogonal complement of the null space AA(H)-'-. Therefore, instead of attempting to minimize an unbiased estimate 
of the MSB, we consider an unbiased estimate of the projected MSB: E |^P^(Hr)p(u) - xHjj, where 



Pk(HO = H^(HH^)tH, 
7 



(20) 



is the orthogonal projection onto '??(H^) = N(H)^. Here f denotes the Moore-Pernose pseudo inverse. 

As shown in [23], an unbiased estimator for the projected MSE is given by the same formula as ( fTSI l. but applied 
on the projected estimate /i(u): 

'7P«,„7-,W")>3') = C2 + \\PK(W)h(»)\\l-'2xlj^h(u) 

+2V„ ■ (P«(H.)/!(u)). (21) 

Again, C2 is a constant independent of h(y). Here the ML estimate is chosen as the minimum-norm minimizer of (fTsT l, 
given by 

XML = H^(HH^)V. (22) 



5. Parameter Tuning with GSURE 

The generalization of SURE provides the possibility of using this estimator for the case of general H, and specif- 
ically, for the deblurring and image scale-up problems. The tuning will be demonstrated for the SSF and PCD algo- 
rithms. As seen in Section |2] these methods have mainlyjtwo unknown parameters, the iterations number K and the 
thresholding value A. Vonesch, Ramani and Unser, in |26], have already addressed this problem for the case where 
the operator H has full column rank. In their method, referred to as the global method, one parameter is being set at 
a time, given the other. We extend this for general H using the projected GSURE, showing the advantage of its usage 
over the regular GSURE, and over the conventional parameter setting methods that were presented in Section |3] 

Though the global method for parameter setting provides us with a good estimation of the parameters as demon- 
strated in the results section, it sets only one parameter given the other. We present an additional approach that sets 
both together. The parameter A takes on a different value in each iteration of the algorithm in a greedy fashion. This 
way, we choose K together with A, setting it to be the point where the estimated MSE stops decreasing. This method 
has the same computational cost for choosing two parameters as the global method has for selecting only one, without 
degrading the MSE performance. We also suggest a greedy look-ahead method that aims at better reconstruction re- 
sults at the expense of higher complexity. We compare between these algorithms and the global method demonstrating 
the advantage of using these approaches for parameter tuning when both A and K are unknown. 

5.1. The Global Method using GSURE 

In order to use GSURE with the iterated shrinkage algorithms, described in Section |2] we need to calculate the 
ML estimator, the derivative of the reconstruction algorithm with respect to u, and the projection operator in the rank- 
deficient case. In our experiments, the ML estimator and the projection matrix are easily calculated by exploiting the 
fact that the blur operators considered are block circulant. As such, they are diagonalized by the 2D discrete Fourier 
transform (DFT). This helps both for the deblurring and the scale-up problems we handle. 

To calculate the derivatives with respect to u, we first reformulate the iterative equation as a function of u instead 
of y. Rewriting (|4]i leads to 

a^+i = Sp^Aic {^-(Wio-'^yy - H^HDff,) + , (23) 
where x = /z.i,a:(u) = \)aK as in (|5]i. The derivatives of x are calculated recursively by first noticing that 



^D^ - i(HD)-HD^ + ^ 

c c du du 



(24) 



where S'^ _^^^{■) is an element wise derivative of the shrinkage function, organized as a diagonal matrix. From here, 
the divergence of the estimator can be directly obtained by multiplying the above by D from the left, gathering the 
diagonal of the matrix and summing it up; 

V„/.,,,(u) = trace = trace (d^). (25) 



It is impracticable to calculate the Jacobian of ( l24l i in high dimensions. Instead we use the trace estimator that was 
used for the GCV in Section lTTI Following Vonesch et. al. fl^ we choose a random vector n ~ A^(0, 1). Using this 
estimator we iterate over , which can be calculated instead of leading to the following iterative formula 



du 



1 



dak dak 



-D^n--(HDfHD^n+^n 

c c du du 



(26) 



For the estimation of the MSE for the PCD algorithm we need to calculate the trace as was done for the SSF. We 
rewrite the PCD iteration in terms of u 



oik+i = ak+ ju(vt - ak) 

= (l-yU)Q't 

+ju {Sp,^A (WD^(cr2u - H^HDa,) + a,)) . 



(27) 



The derivative with respect to u is given by 

du 



(1 -f^) — 
du 



+fiS'p y,, {WD^{cr\ - H^HDffi) + a^) ■ 

WDV^I-H^HD^) + ^ 
du du 



Using randomization we get the following iterative formula for estimating the trace value, 

dak+i dak 
du du 

(WD^(cr2u - H^HDa,) + a,) ■ 

WDV^n-H-HD^n)+^n 
du du 



(28) 



(29) 



Now that we hold a complete expression for the GSURE MSE estimate, K and A can be chosen by minimizing 
it. This strategy was developed in [1261] and is referred to as the global method. In the full-rank case, the projection 
matrix is the identity and the projected GSURE coincides with the regular GSURE. Thus we can use the projected 
GSURE equation in (|2TI) for both cases. For a fixed and pre-chosen number of iterations K, the A which minimizes 
the GSURE expression is chosen. Repeating this process for various values of K, one can minimize the overall global 
MSE with respect to both K and A. As an analytical minimization of the GSURE is hard to achieve, we use a golden 
section search lissll . We initiahze these algorithms by 



ao 



(HD)^y, 



(30) 



and 



dao 
du 



= (r D 



(31) 



Denoting by T the GSURE calculation time (per-iteration), rigs the golden-section number of iterations, and «,j 
the number of iterations of the iterated shrinkage algorithm, the time complexity of the global method for setting A, 
when K is fixed, is OinisrigsT). For setting K another golden section can be performed over the number of iterations. 



9 



5.2. The Greedy Method using GSURE 

In the greedy method, instead of choosing one global A for all the iterations, we take a different route. Turning to 
a local alternative, the value of A can be determined as the minimizer of the estimated MSE of the current iteration. 
In this strategy, K is set together with A as the point where the estimated MSE (almost) stops decreasing. This allows 
us to avoid setting each of the parameters separately, as done in the global method. The algorithm proposed is the 
following: 

• Initialize and calculate its derivative w.r.t. u. 

• Repeat: 

1. Setxt* = argmin,i(77p^^^^^(/!^(u,Q'^_i))). 

2. Perform the iterations in ( l27l i and ( [29] l (PCD case) or (l23T l and (|26] | (SSF case) for the calculation of 
and using /l^. 

3. Compute 

MSE; = 77P„ {Dak, n^D^^^^n, Xml) using (EB, ©, (ES, ^ and ^ or 

4. If MSE*_i - MSE* < 5, stop. 

• X = Dart. 

The complexity of the greedy algorithm is also OinisrigsT). When one of the parameters is fixed in the global 
method and we need to set only the other parameter we get the same complexity in the two approaches. In the case 
where both parameters are to be set, the greedy technique has an advantage since it chooses the number of iterations 
along with the parameter A. In contrast, in the global method the number of iterations is either pre-chosen and thus 
suboptimal, or searched using yet another golden-search for optimizing K increasing the overall complexity. 

To further decrease the MSE, we can modify the greedy algorithm by introducing a look-ahead option. One of 
the problems of the greedy strategy is that it minimizes the MSE of the current iteration but can harm the overall 
MSE. Thus, instead of choosing A that minimizes the estimated MSE of the current iteration, we select it to minimize 
the estimated MSE of r iterations ahead, assuming that these r iterations are performed using the greedy approach 
described above. This change provides a look-ahead of r iterations, formulated as the following algorithm: 

• Initialize oro and calculate its derivative w.rt. u. 

• Repeat: 

1. Set A*^. by minimizing the estimated MSE r iterations ahead using the above-described greedy algorithm. 

2. Perform a single iteration as in (l27b and ( |29] l (PCD case) or (|23] ) and ( |26] | (SSF case) for the calculation of 
ak and using /I*. 

3. Compute 

MSE* = ?7p„ (Dor/f , n^D^^n, Xml) using (EB, ©, ^ and ^ or 

4. If MSE;_, -MSE^* < (5, stop. 

• X = Dofi. 

In step 1, for each test A in the golden-section, r iterations of the greedy method are performed, as described for the 
greedy algorithm. Finally, A of the current iteration is chosen such that the estimated MSE of the last rth greedy 
iteration is minimized. The time complexity of the r look-ahead greedy algorithm is n,j(n^j)^rr, which is n^^r times 
slower than the other two methods. However, this is partially compensated for in some cases by getting smaller 
due to faster convergence. Furthermore, the resulting MSE is often lower, as we illustrate in the results section. 
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6. Results 



The greedy methods were previously presented, with preliminary results on low dimension signals, in 1 36] . In this 
work we extended their use for images. In this section we present a set of experiments that demonstrate parameter 
tuning for the deblurring and the scale-up problems, using the PCD and the SSF algorithms. The representative 
experiments chosen correspond to different scenarios, which enable a demonstration of the various algorithms and 
their behavior We should note that our work included various more tests, which are not brought here because of space 
limitation, and which lead to the same final conclusions. Table [T]lists the five tests performed. The tests we present 
divide into two groups: (i) preliminary ones that demonstrate the superiority of the projected GSURE over all the 
other alternatives, and then (ii) advanced tests that compare the global and the greedy methods. 



Problem 


Blur 




Decimation 


Algorithm 


Cond-# 


1. Deblurring 


kerl : {I + + x^^y^ 

-1 < Xl,X2 <1 


2 or 8 


none 


PCD 


77 


2. Deblurring 


kerl : 1/81 
-4 < xi,X2 < 4 


0.308 


none 


PCD 


2.5e + 5 


3. Deblurring 


ker3 : [1,4,6,4,1] 
separable 


49 


none 


PCD 


oo 


4. Scale-Up 


kerA : [1,2, l]/4 
separable 


49 


2 : 1 
in each axis 


SSF 


oo 


5. Scale-Up 


kerS : [1, l]/2 
separable 


16 


2 : 1 
in each axis 


SSF 


oo 



Table 1 : A list of the experiments presented. 



6.1. Preliminary Tests 

We start by demonstrating the core ability of the GSURE and the classical techniques to choose the required 
parameters. In Fig. |4]we present the GSURE ISNR estimatioiJl for Problem-1 applied on Cameraman with cr^ - 2 
as a function of the iterations. Similarly, Fig. |5]presents the GSURE ISNR as a function of A for the same problem 
with cr^ = 8. In both cases, the other parameter is kept fixed, choosing it to be the optimal one based on the true 
MSE. It can be observed in both graphs that the GSURE gives a very good estimation for the MSE and thus also a 
high-quality estimate of both A and the number of iterations K. It can also be seen that it selects the parameters to be 
much closer to the optimal selection, compared to all the conventional methods. Among these conventional methods, 
the L-curve gives the best approximation for the parameters. However, it is hard to compute in an efficient way. The 
other methods approximate A relatively well but fail in choosing K. 

Repeating the experiment reported in Fig.|4]on the image Boat and fixing A - 0.0725, Table |2] summarizes the 
ISNR reconstruction results for setting K. It can be seen again that the GSURE gives the best reconstruction results, 
being approximately the same as those achieved by the optimal selection. 

In Figs. |6]and|7]a comparison is made between the ISNR estimation using the regular and the projected GSURE. 
The problem considered in the first graph is Problem-2 and in the second is Problem-3, where both are applied on 
Csrnieraman. The selected parameters of the various methods are marked. The first graph is a function of A and the 
second is a function of the iterations K. It can be seen that for both operators the estimation of A and the iterations 
number using the projected GSURE is reasonable and always better than using the regular GSURE. Also, we see 
that in the case where we have a blur with real null space we get a totally unreliable estimation of the ISNR with the 
regular GSURE. We also see that L-curve fails this time in estimating A. 

Table |3] compares the various parameter setting methods for a deblurring experiment based on Problem-2 for 
reconstructing the image Peppers when A is being tuned. The visual effect of the various parameter selections can 
be observed in Fig. [8] We see the same behavior as before, indicating that the projected GSURE is the best choice. 



^The reference MSE is obtained by considering y as the estimate. 
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True ISNR 

Estimated ISNR 

optimal selection 

O GSURE seiection 
discrepancy selection 
GCV setting seiection 

^ L-curve setting seiection 



20 



40 60 80 

Iteration number 



100 



120 



Figure 4: The GSURE-estimated and the true ISNR as a function of the iterations number for Problem- 1 with cr^ -2 
and fixing A = 0.065. The selection of the iterations number based on the various discussed methods is marked. 
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Figure 5: The GSURE-estimated and the true ISNR as a function of A for Problem-1 with cr^ = 8 and fixing K - 27. 
The selection of A based on the various discussed methods is marked. 



Parameter Tuning Method 


ISNR 


Optimal Selection 


6.46 IdB 


GSURE 


6.458dB 


L-curve 


6.43dB 


GCV 


5.21dB 


Discrepancy Principle 


5.47dB 



Table 2: ISNR deblurring results for Problem-1 on Boat with cr^ = 2, fixing A - 0.0725 and setting K using various 
methods. 
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True ISNR 

Estimated ISNR 

■5|f optimal selection 
O GSURE seiection 
X discrepancy seiection 
□ GCV setting selection 
^ L-curve setting selection 
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True ISNR 

GSURE Estimated ISNR 

-^Projected GSURE Estimated ISNR 

^ optimal selection 

O GSURE selection 

+ projected GSURE selection 

X discrepancy selection 

□ GOV setting selection 

^ L-curve setting selection 



0.05 0.1 0.15 0.2 

X 

Figure 6: The true and estimated ISNR as a function of A, using the GSURE and its projected version. Applied on 
Problem-2, the iterations number is set to K - 150, and A is estimated using the various proposed methods. 
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Figure 7: The true and estimated ISNR as a function of K, using the GSURE and its projected version. Applied on 
Problem-3, A is set to A - 0.67, and K is estimated using the various proposed methods. 
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leading to the best reconstruction results. The results again show that the regular GSURE (and the conventional 
methods) tend to fail, while the projected version performs very well. 



Parameter Tuning Method 


ISNR 


optimal selection 


8.86dB 


projected GSURE 


8.86dB 


discrepancy principle 


8.28dB 


GSURE 


8.17dB 


GCV 


8.1dB 


L-curve 


7.51dB 



Table 3: The deblurring results for Problem-2 applied on Peppers with cr^ - 2, fixing K = 92, and setting A using 
the different parameter tuning techniques. 

In Figs. |9] and [To] a comparison is made between the estimation of the PSNR using the regular and the projected 
GSURE for Problem 4 (scale-up). As before, a comparison is done between the parameter values selected by the 
various methods. In addition, we show the reconstruction quality achieved by the bicubic, the Lanczos, and the 
bilinear interpolation method^ as a reference performance to compare against. 

The first graph is a function of the iterations K and the second is a function of A. It can be seen that the reconstruc- 
tion algorithm with correctly tuned parameters achieves better reconstruction results than the conventional scale-up 
techniques, demonstrating the importance of the parameter tuning. Here as well, it can be observed that the projected 
GSURE provides the best estimation for the selection of A and K. 

As in the case of deblurring, the scale-up results show that the projected GSURE gives the best reconstruction 
results. Because of its superiority, we shall use it hereafter in the various experiments that follow. Since in the case 
of a fuU rank operator, the GSURE and projected GSURE coincide, in every place that GSURE is referenced, the 
projected GSURE would be the one being intended. 

6.2. Advanced Tests 

We now turn to look at the greedy strategies and compare them to the global method, all aiming to jointly estimate 
K and A. In Fig.[TT]we present the results obtained for Problem-1 applied on Cameraman. The graph shows the true 
ISNR after each iteration of the greedy and the greedy look-ahead methods (with r = 1), together with their ISNR 
estimation. As can be seen, the estimation has the same behavior as the true ISNR. In particular, in both graphs the 
maxima are very close. Choosing the number of iterations based on GSURE yields an ISNR very close to the one 
based on the true ISNR. This justifies our claim about using the greedy algorithm as an automatic stopping rule, by 
detecting an increase in the ISNR. The maxima in both graphs (marked in a circle as before) are very close, both in the 
greedy, as well as in the greedy look-ahead case. Figure[T2]presents the actual result obtained for a similar experiment 
applied on the image House with Problem-2. 

A comparison of the MSE of the global, the greedy, and the look-ahead techniques for the deblurring problem in 
images is presented in Fig.[T3]for Problems-1, 2, and 3 on the image Cameraman. In all cases, no real advantage is 
seen in terms of ISNR by each of the techniques. Since the greedy algorithm sets the iterations number together with 
A, this means that the greedy methods are more efficient in the overall performance, at a possible sUght costs in terms 
of the final ISNR. 

Turning to the scale-up problems 4 and 5, the same comparison is done in Fig. [14] Here again we get similar results 
in terms of PSNR for the three methods. This implies that the two approaches (global and greedy) are equivalent in 
terms of the output quality, and the differences are mostly in their complexities, with a clear advantage to the greedy 
techniques. 

To summarize, for the various deblurring and scale-up tests, there is no real advantage to one of the methods over 
the other, in terms of the quality of the reconstruction results. Thus, when we need to set both A and K, the greedy 
methods are to be favored. 



^these scale-up techniques do not require parameter tuning. 
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(d) GCV selection. ISNR = 8.05 (e) L-curve selection. ISNR = 7.64 (f) discrepancy principle selection. 
dB. dB. ISNR = 8.21 dB. 




Figure 8: The reconstruction result for Peppers for Problem-2, wiien K - 92 and A is being tuned based on the 
different parameter tuning techniques. Form top left to bottom right: original image, blurred image, tuning using true 
ISNR, GCV, L-curve, discrepancy principle, GSURE and projected GSURE. 
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Figure 9: The true and estimated PSNR as a function of K, using the GSURE and its projected version. Applied on 
Problem-4, A is set to be /l = 0.5, and K is estimated using the various proposed methods. 
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Figure 10: The true and estimated PSNR as a function of A, using the GSURE and its projected version. Applied on 
Problem-4, K is set to be = 27, and A is estimated using the various proposed methods. 
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Figure 1 1 : The GSURE-estimated and the true ISNR as a function of the iteration number for the greedy algorithm 
(top) and for the greedy look-ahead algorithm with r - I (bottom) for Problem-!. 




(a) original image. (b) blurred image. (c) greedy look-ahead method with r = 1. 

ISNR = 8.65dB. 



Figure 12: The reconstruction result for House for Problem-2, where A is set based on the greedy look-ahead method 
with r = 1, and K is set as a stopping rule. 
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Iteration number Iteration number Iteration number 

(a) Problem-1 and cr^ = 8. (b) Problem-2. (c) Problem-3. 

Figure 1 3 : The true ISNR as a function of the iterations number for the global, greedy, and look-ahead-greedy methods 
for Problems 1, 2 and 3 with different kernels and noise power 




Figure 14: The true PSNR as a function of the iterations number for the global, greedy, and greedy look-ahead 
methods for Problems 4 and 5. 
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7. Conclusion and Discussion 



In this work we have considered automatic tuning of parameters for inverse problems. Our focus has been iterated 
shrinkage algorithms for deblurring and image scale-up, targeting an automatic way for choosing A in each iteration 
in these methods, and choosing the number of iterations K. We extended the global tuning method developed in 
ll26ll to general ill-posed problems, by exploiting a projected version of the GSURE. We also applied the GCV, the 
L-curve, and the discrepancy-principle methods for this task of parameter tuning, and compared their selection with 
those of the (projected) GSURE. The GSURE was shown to give better approximation for the true values of the tuned 
parameters, leading to better results in the reconstruction. 

Two greedy methods for parameter tuning - a simple and a look-ahead version - were presented. These two 
methods are shown to perform as well as the global method, with the advantage of setting an automatic stopping rule 
together with the A parameter, whereas the previous methods set only one of the parameters, given the other 

We are aware of the fact that there are better algorithms today for image deblurring and scale-up. However, the 
scope of this paper is to demonstrate the applicability of the GSURE for parameter tuning for such methods and not 
to compete with current state of the art reconstruction results. Using the concepts presented in this paper, automatic 
parameter tuning can be performed also for other techniques. 
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